
docs: add Android LLM runner page and HuggingFace #19611

Open
omkar-334 wants to merge 6 commits into
pytorch:mainfrom
omkar-334:docs-hf

Conversation

@omkar-334 omkar-334 commented May 15, 2026

Summary

  1. New docs/source/llm/run-on-android.md, a Java reference for the executorch-android AAR runner. Same shape as run-on-ios.md. Covers LlmModule, the LlmModuleConfig builder, LlmGenerationConfig, the LlmCallback methods, load/stop/resetContext, and the image/audio prefill variants. Points at LlamaDemo.

  2. Added run-on-android to the LLM toctree in working-with-llms.md, sitting between the Qualcomm page and iOS.

  3. In getting-started.md, swapped the two GitHub example links for the in-docs Android and iOS pages so users stay in the docs.

  4. Added a tip admonition to using-executorch-export.md under Model Preparation, sending HF Hub users to export-llm-optimum.md before the manual flow.

  5. Cleaned up export-llm-optimum.md. Removed the leftover "Method 1" framing since only the CLI path is documented, bumped the orphaned subheadings up a level, and pointed the Running on Device links at the new Android page and the existing iOS page (sample apps kept inline).
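The new Android page centers on a builder-style configuration API. As a rough, self-contained illustration of that shape, here is a stand-in config class written for this note; it is not the real `LlmModuleConfig` from the executorch-android AAR, whose fields and defaults may differ.

```java
// Stand-in mirroring the builder style the new page documents;
// NOT the actual executorch-android API.
public class LlmModuleConfigSketch {
    private final String modulePath;
    private final String tokenizerPath;
    private final float temperature;

    private LlmModuleConfigSketch(Builder b) {
        this.modulePath = b.modulePath;
        this.tokenizerPath = b.tokenizerPath;
        this.temperature = b.temperature;
    }

    public String modulePath() { return modulePath; }
    public String tokenizerPath() { return tokenizerPath; }
    public float temperature() { return temperature; }

    public static Builder create() { return new Builder(); }

    public static class Builder {
        private String modulePath;
        private String tokenizerPath;
        private float temperature = 0.8f;  // illustrative default, not the AAR's

        public Builder modulePath(String p) { this.modulePath = p; return this; }
        public Builder tokenizerPath(String p) { this.tokenizerPath = p; return this; }
        public Builder temperature(float t) { this.temperature = t; return this; }
        public LlmModuleConfigSketch build() { return new LlmModuleConfigSketch(this); }
    }

    public static void main(String[] args) {
        // Chained builder calls, then build() produces an immutable config.
        LlmModuleConfigSketch config = LlmModuleConfigSketch.create()
            .modulePath("/data/local/tmp/model.pte")      // hypothetical path
            .tokenizerPath("/data/local/tmp/tokenizer.bin") // hypothetical path
            .temperature(0.7f)
            .build();
        System.out.println(config.modulePath() + " @ " + config.temperature());
    }
}
```

The immutable-config-plus-builder pattern is common in Android libraries because it keeps long option lists readable at call sites.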

Fixes #8790

cc @mergennachin @AlannaBurke @larryliu0820 @cccclai @helunwencser @jackzhxng @byjlw

omkar-334 added 5 commits May 15, 2026 11:52
Mirrors the structure of run-on-ios.md for the executorch-android AAR.
Covers LlmModule, LlmModuleConfig builder, LlmGenerationConfig,
LlmCallback (onResult/onStats/onError), load/stop/resetContext, and
multimodal prefill (images via int[]/ByteBuffer/float[],
prefillNormalizedImage, prefillAudio, prefillRawAudio).
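The three callback methods named above can be sketched as a plain Java interface. The interface below is a stand-in for illustration only; the real `LlmCallback` ships in the executorch-android AAR and its exact signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class CallbackSketch {
    // Stand-in for the AAR's LlmCallback; real method signatures may differ.
    interface LlmCallback {
        void onResult(String token);    // a streamed token chunk
        void onStats(String statsJson); // generation statistics
        void onError(String message);   // error reporting
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        LlmCallback cb = new LlmCallback() {
            @Override public void onResult(String token) { tokens.add(token); }
            @Override public void onStats(String statsJson) {
                System.out.println("stats: " + statsJson);
            }
            @Override public void onError(String message) {
                System.err.println("error: " + message);
            }
        };
        // Simulate a runner streaming tokens into the callback:
        for (String t : new String[] {"Hello", ",", " world"}) {
            cb.onResult(t);
        }
        System.out.println(String.join("", tokens)); // prints "Hello, world"
    }
}
```

Streaming token-by-token through a callback is what lets a UI render partial output while generation is still running.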
Slots the new Android page between the Qualcomm guide and run-on-ios so
it appears in the LLM section sidebar.
…ch#8790)

Replaces github.com/meta-pytorch/executorch-examples links for Android
and iOS with the in-docs run-on-android.md and run-on-ios.md pages so
the Running section stays inside the docs.
Surfaces the Hugging Face export path from the main Model Export and
Lowering page via a tip admonition under Model Preparation, pointing
users to llm/export-llm-optimum.md before the manual export walkthrough.
…ytorch#8790)

Drops the stale Export Methods / Method 1 framing (only the CLI method
is documented) and promotes the now-orphaned h4 headings up one level.
Updates the Running on Device section to link the new in-docs Android
page and existing iOS page, with the LlamaDemo and etLLM sample apps
preserved inline.
Copilot AI review requested due to automatic review settings May 15, 2026 06:25
@omkar-334 omkar-334 requested a review from mergennachin as a code owner May 15, 2026 06:25
pytorch-bot Bot commented May 15, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19611

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) May 15, 2026
@github-actions github-actions Bot added the labels docathon-2026, medium (Medium Difficulty for issues as part of PyTorch Docathon H1 2026), module: doc (Issues related to documentation, both in docs/ and inlined in code), module: llm (Issues related to LLM examples and apps, and to the extensions/llm/ code), module: user experience (Issues related to reducing friction for users), and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) May 15, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Contributor

Copilot AI left a comment


Pull request overview

Adds first-party documentation for running LLMs on Android via the executorch-android AAR and improves the Hugging Face (Optimum ExecuTorch) discovery flow across the LLM docs.

Changes:

  • Added a new Android LLM runner guide (run-on-android.md) documenting LlmModule, LlmModuleConfig, LlmGenerationConfig, callbacks, and multimodal prefill APIs.
  • Updated LLM docs navigation to include the Android guide and adjusted “Getting Started” running links to point to in-doc pages.
  • Improved Hugging Face export guidance by adding an Optimum tip in the export docs and cleaning up the Optimum export page (headings + device-running links).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
| --- | --- |
| docs/source/using-executorch-export.md | Adds a tip directing Hugging Face Hub users to the Optimum ExecuTorch export flow. |
| docs/source/llm/working-with-llms.md | Inserts run-on-android into the LLM toctree. |
| docs/source/llm/run-on-android.md | New Android Java runtime guide for LlmModule + configs + multimodal prefill APIs. |
| docs/source/llm/getting-started.md | Updates "Running" links to point to in-doc Android/iOS pages. |
| docs/source/llm/export-llm-optimum.md | Renames/restructures CLI export section and updates "Running on device" links to new docs pages. |
Comments suppressed due to low confidence (2)

**docs/source/llm/run-on-android.md:113**

  • LlmGenerationConfig docs list maxNewTokens and warming as supported generation parameters, but LlmModule.generate(String, LlmGenerationConfig, ...) currently only reads seqLen, echo, temperature, numBos, and numEos (it never uses maxNewTokens / warming). This makes the documentation misleading because setting those fields has no effect. Consider either updating the doc to call out which fields are currently honored, or updating the Java binding/native call to plumb maxNewTokens/warming through if supported by the underlying runner.

For full control over generation parameters, use `LlmGenerationConfig`:

```java
LlmGenerationConfig genConfig = LlmGenerationConfig.create()
    .seqLen(2048)
    .temperature(0.8f)
    .echo(false)
    .build();

module.generate("Once upon a time", genConfig, callback);
```

`LlmGenerationConfig` exposes `echo`, `maxNewTokens`, `seqLen`, `temperature`, `numBos`, `numEos`, and `warming`. Defaults match the C++ `GenerationConfig` documented in Running LLMs with C++.

**docs/source/llm/run-on-android.md:164**

  • In the normalized-image `ByteBuffer` example, after writing floats into `floatBuffer` the buffer position will typically be at the end, so calling `prefillNormalizedImage(floatBuffer, ...)` will fail validation due to insufficient `remaining()` bytes. The example should reset the position (e.g., `flip()`/`rewind()`) after filling the buffer, similar to the raw-byte example above.

```java
ByteBuffer floatBuffer = ByteBuffer
    .allocateDirect(3 * 336 * 336 * Float.BYTES)
    .order(ByteOrder.nativeOrder());
// fill floatBuffer with normalized values, then:
module.prefillNormalizedImage(floatBuffer, 336, 336, 3);
```

```java
LlmModule module = new LlmModule(config);
```

Available load modes are `LOAD_MODE_FILE`, `LOAD_MODE_MMAP` (default), `LOAD_MODE_MMAP_USE_MLOCK`, and `LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS`. Available model types are `MODEL_TYPE_TEXT`, `MODEL_TYPE_TEXT_VISION`, and `MODEL_TYPE_MULTIMODAL`.
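The buffer-position pitfall the reviewer flags is plain `java.nio` behavior, independent of any ExecuTorch API, and can be demonstrated with a small buffer (sizes here are arbitrary, not tied to any model):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BufferPositionDemo {
    public static void main(String[] args) {
        // A direct buffer holding 4 floats (16 bytes).
        ByteBuffer buf = ByteBuffer
            .allocateDirect(4 * Float.BYTES)
            .order(ByteOrder.nativeOrder());

        // Each putFloat advances the position by 4 bytes.
        for (int i = 0; i < 4; i++) {
            buf.putFloat(0.5f);
        }

        // Position now equals capacity, so a consumer sees nothing left:
        System.out.println("before rewind: remaining = " + buf.remaining()); // prints 0

        // Reset the position so the full contents are readable again:
        buf.rewind();
        System.out.println("after rewind: remaining = " + buf.remaining()); // prints 16
    }
}
```

Any native call that validates `remaining()` against an expected byte count would reject the buffer in the first state and accept it in the second, which is exactly why the review suggests a `flip()`/`rewind()` in the doc example.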



Development

Successfully merging this pull request may close these issues.

Document HuggingFace Integration
